COVID-19 is an infectious disease that can be fatal. [1]

This analysis uses two data sets from the NY Times GitHub repository [2], as well as census data from the R package tidycensus [3].

This project is currently under construction, but until everything is up and running smoothly, check out my previous version of this project below:

Project Updated: 2022-05-13

Data Updated: 2022-05-12

Background and Data

COVID-19 is causing havoc in Oregon once again, and as numbers continue to spike, I decided to revisit one of the first projects I worked on in RStudio. That project can be seen here, and used data from Johns Hopkins. However, because that data is no longer updated, this investigation uses the more current data from the NY Times. The repository for the NY Times data can be found here, and the data sets included are:

  1. us.states: state-level data (file description here)

  2. us.counties: county-level data (file description here)

  3. vacc: state-level COVID-19 daily vaccination time series data from the Johns Hopkins University repository (file description here)

Here is the data:

library(readr)  # provides read_csv()
# NY Times cumulative case and death counts, by state and by county
us.states <- read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv')
us.counties <- read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
# Johns Hopkins state-level vaccination time series
vacc <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/us_data/time_series/people_vaccinated_us_timeline.csv")

The project cited above mainly looked at case, death, and vaccination numbers per state to compare highly and mildly impacted states. In this project I look at highly impacted states and the counties of Oregon. Additionally, this project uses population and density data from the tidycensus package. I discuss how I got this data using an API in a blog post here. Note that this data is from 2019, which is a couple of years older than the COVID data.

# tidycensus (needs a Census API key, set once with census_api_key())
library(tidycensus)
library(dplyr)  # provides %>% and rename()
# State: POP and DENSITY data
state.pop <- get_estimates(geography = "state", year = 2019, variables = "POP") %>%
  rename(state = NAME, population = value)
state.den <- get_estimates(geography = "state", year = 2019, variables = "DENSITY") %>%
  rename(state = NAME, density = value)
# Oregon: POP and DENSITY data
or.county.pop <- get_estimates(geography = "county", state = "OR", year = 2019, variables = "POP") %>%
  rename(county = NAME, population = value)
or.county.den <- get_estimates(geography = "county", state = "OR", year = 2019, variables = "DENSITY") %>%
  rename(county = NAME, density = value)
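One detail worth checking before joining the county estimates to the NY Times data is the join key. The sketch below assumes that the county NAME column comes back in the form "Multnomah County, Oregon", while the NY Times county column holds just "Multnomah". The toy data frame and the names or.county.pop.raw and or.county.pop.clean are hypothetical and only illustrate one way to strip the suffix.

```r
# Toy stand-in for the tidycensus result. The NAME format shown here is an
# assumption about what get_estimates() returns for counties.
or.county.pop.raw <- data.frame(
  NAME  = c("Multnomah County, Oregon", "Washington County, Oregon"),
  value = c(812855, 601592),
  stringsAsFactors = FALSE
)

# Strip the " County, Oregon" suffix so the key can match a bare county name.
or.county.pop.clean <- data.frame(
  county     = sub(" County, Oregon$", "", or.county.pop.raw$NAME),
  population = or.county.pop.raw$value,
  stringsAsFactors = FALSE
)

print(or.county.pop.clean)
```

If the names already match in your copy of the data, this step is unnecessary.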

Wrangling the Data

The COVID data sets are already in long form (the dates are in rows instead of columns), and the date column is already stored as a date variable. The main tasks are therefore to join the us.states data set with the vaccination records to form us.states.vacc, then join us.states.vacc with the population estimates and create new percentage columns. The same steps, minus the vaccination records, are applied to the Oregon county data.

## # A tibble: 5 × 12
##   date       state      population density cases     case.per deaths perc.deaths
##   <date>     <chr>           <dbl>   <dbl> <dbl>        <dbl>  <dbl>       <dbl>
## 1 2020-01-21 Washington    7614893    115.     1 0.000000131       0           0
## 2 2020-01-22 Washington    7614893    115.     1 0.000000131       0           0
## 3 2020-01-23 Washington    7614893    115.     1 0.000000131       0           0
## 4 2020-01-24 Illinois     12671821    228.     1 0.0000000789      0           0
## 5 2020-01-24 Washington    7614893    115.     1 0.000000131       0           0
## # … with 4 more variables: full.vacc <dbl>, full.vacc.perc <dbl>,
## #   part.vacc <dbl>, part.vacc.perc <dbl>

## # A tibble: 10 × 8
##    date       county     population density cases cases.perc deaths deaths.perc
##    <date>     <chr>           <dbl>   <dbl> <dbl>      <dbl>  <dbl>       <dbl>
##  1 2020-02-28 Washington     601592   831.      1 0.00000166      0           0
##  2 2020-02-29 Washington     601592   831.      1 0.00000166      0           0
##  3 2020-03-01 Washington     601592   831.      2 0.00000332      0           0
##  4 2020-03-02 Washington     601592   831.      2 0.00000332      0           0
##  5 2020-03-03 Washington     601592   831.      2 0.00000332      0           0
##  6 2020-03-04 Washington     601592   831.      2 0.00000332      0           0
##  7 2020-03-05 Washington     601592   831.      2 0.00000332      0           0
##  8 2020-03-06 Washington     601592   831.      2 0.00000332      0           0
##  9 2020-03-07 Jackson        220944    79.4     2 0.00000905      0           0
## 10 2020-03-07 Klamath         68238    11.5     1 0.0000147       0           0
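The state-level join described in this section can be sketched roughly as follows. This is a minimal illustration on two toy rows (values taken from the tables later in this post), not the exact code behind the output above. The vaccination column names Date, Province_State, and People_Fully_Vaccinated are assumptions about the Johns Hopkins file, and the share columns are stored as fractions to match the printed tibble.

```r
library(dplyr)

# Toy rows standing in for the three sources.
us.states <- tibble(
  date   = as.Date("2022-05-12"),
  state  = c("California", "Texas"),
  deaths = c(90818, 88425)
)
vacc <- tibble(
  Date                    = as.Date("2022-05-12"),   # assumed column name
  Province_State          = c("California", "Texas"), # assumed column name
  People_Fully_Vaccinated = c(28593661, 17871703)     # assumed column name
)
state.pop <- tibble(
  state      = c("California", "Texas"),
  population = c(39512223, 28995881)
)

us.states.vacc <- us.states %>%
  # Match vaccination records on both date and state.
  left_join(vacc, by = c("date" = "Date", "state" = "Province_State")) %>%
  # Population is a single 2019 estimate per state, so join on state only.
  left_join(state.pop, by = "state") %>%
  mutate(
    perc.deaths    = deaths / population,
    full.vacc      = People_Fully_Vaccinated,
    full.vacc.perc = full.vacc / population
  )

print(us.states.vacc)
```

A left join keeps every row of the case data, so dates before vaccination records exist simply carry NA in the vaccination columns.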

Using the data

Looking at States

To begin, let's look at the country as a whole, by state. The data is filtered for 2022-05-12, and the tables below show the top five states with:

Highest percentage of cases

Highest Percent of Cases
By State as of 2022-05-12

State         Total Population  Cases      Percentage
Rhode Island  1,059,361         380,384    35.91%
Alaska        731,545           254,467    34.78%
North Dakota  762,062           242,222    31.79%
Kentucky      4,467,673         1,344,784  30.10%
Utah          3,205,958         939,092    29.29%

Highest number of deaths

Most Deaths
By State as of 2022-05-12

State         Total Population  Deaths  Percentage
California    39,512,223        90,818  0.23%
Texas         28,995,881        88,425  0.30%
Florida       21,477,737        74,158  0.35%
New York      19,453,561        67,913  0.35%
Pennsylvania  12,801,989        44,814  0.35%

Highest percentage of deaths

Highest Percent of Deaths
By State as of 2022-05-12

State          Total Population  Deaths  Percentage
Mississippi    2,976,149         12,457  0.42%
Arizona        7,278,717         30,230  0.42%
Alabama        4,903,185         19,623  0.40%
West Virginia  1,792,147         6,893   0.38%
Tennessee      6,829,174         25,988  0.38%
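The two death rankings above differ because one sorts by raw count and the other by share of population. A small sketch of that contrast, using only the ten states that appear in the two tables (not the full data set), might look like this. The names state.deaths, top.count, and top.share are hypothetical.

```r
library(dplyr)

# The ten states shown in the two death tables, as of 2022-05-12.
state.deaths <- tibble(
  state = c("California", "Texas", "Florida", "New York", "Pennsylvania",
            "Mississippi", "Arizona", "Alabama", "West Virginia", "Tennessee"),
  population = c(39512223, 28995881, 21477737, 19453561, 12801989,
                 2976149, 7278717, 4903185, 1792147, 6829174),
  deaths = c(90818, 88425, 74158, 67913, 44814,
             12457, 30230, 19623, 6893, 25988)
) %>%
  mutate(deaths.perc = 100 * deaths / population)

# Top five by raw count: dominated by the most populous states.
top.count <- state.deaths %>% slice_max(deaths, n = 5)

# Top five by share of population: a different set of states.
top.share <- state.deaths %>% slice_max(deaths.perc, n = 5)

print(top.count$state)
print(top.share$state)
```

On the full data the same two calls would follow a filter on the date column, since the NY Times counts are cumulative and the latest date holds the running totals.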

Most people fully vaccinated

Most People Vaccinated
By State as of 2022-05-12

State         Total Population  People Fully Vaccinated  Percentage
California    39,512,223        28,593,661               72.37%
Texas         28,995,881        17,871,703               61.64%
New York      19,453,561        15,000,019               77.11%
Florida       21,477,737        14,417,209               67.13%
Pennsylvania  12,801,989        8,791,408                68.67%

Highest percentage of population fully vaccinated

Highest Percent of People Fully Vaccinated
By State as of 2022-05-12

State                 Total Population  People Fully Vaccinated  Percentage
District of Columbia  705,749           676,066                  95.79%
Puerto Rico           3,193,694         2,683,822                84.04%
Rhode Island          1,059,361         876,943                  82.78%
Vermont               623,989           507,556                  81.34%
Maine                 1,344,212         1,072,464                79.78%

Looking at Oregon Counties

Next, let's look at the data a little closer to home: Oregon counties. The data is first filtered to the most recent date, which as of this writing is 2022-05-12. We then look at a graph of the state as a whole, followed by the top five Oregon counties.

Highest number of cases

Most Cases
Oregon Counties as of 2022-05-12

County      Total Population  Cases    Percentage
Multnomah   812,855           123,448  15.19%
Washington  601,592           90,808   15.09%
Marion      347,818           70,656   20.31%
Clackamas   418,187           63,625   15.21%
Lane        382,067           59,691   15.62%

Highest percentage of Cases

Highest Percentage of Cases
Oregon Counties as of 2022-05-12

County     Total Population  Cases   Percentage
Jefferson  24,658            7,315   29.67%
Umatilla   77,950            22,532  28.91%
Malheur    30,571            8,251   26.99%
Morrow     11,603            2,980   25.68%
Crook      24,404            6,153   25.21%
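The Percentage column in these county tables is cases divided by the 2019 population estimate, shown to two decimals. As a quick check, the base R sketch below recomputes and formats two of the rows above. The data frame or.cases and its label columns are hypothetical, and this is not necessarily how the original tables were formatted.

```r
# Two rows copied from the county tables above.
or.cases <- data.frame(
  county     = c("Multnomah", "Jefferson"),
  population = c(812855, 24658),
  cases      = c(123448, 7315),
  stringsAsFactors = FALSE
)

# Share of the population with a recorded case, as a two-decimal percent label.
or.cases$percentage <- sprintf("%.2f%%", 100 * or.cases$cases / or.cases$population)

# Thousands separators for display; trim = TRUE avoids left padding.
or.cases$population.label <- format(or.cases$population, big.mark = ",", trim = TRUE)

print(or.cases)
```

Keeping the numeric columns alongside the labels means the table can still be sorted by value before display.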

Highest number of deaths

Most Deaths
Oregon Counties as of 2022-05-12

County      Total Population  Deaths  Percentage
Multnomah   812,855           1,206   0.15%
Marion      347,818           721     0.21%
Clackamas   418,187           632     0.15%
Washington  601,592           594     0.10%
Lane        382,067           540     0.14%

Highest percentage of Deaths

Highest Percent of Deaths
Oregon Counties as of 2022-05-12

County     Total Population  Deaths  Percentage
Harney     7,393             38      0.51%
Josephine  87,487            339     0.39%
Jefferson  24,658            92      0.37%
Lake       7,869             29      0.37%
Douglas    110,980           396     0.36%

  1. World Health Organization. Coronavirus disease (COVID-19). 2022.

  2. New York Times. COVID-19 Data.

  3. tidycensus.